ABSTRACT


This paper presents a statistical software package 
developed for use on the Apple II Plus microcomputer, 
modified with the Apple Pascal language card. The program 
addresses the following: determination of confidence 
intervals for single and bivariate populations; hypothesis 
testing for one and two parameters; computation of 
cumulative distribution values for the Normal, Student's T, 
Chi-square, F, Binomial and Poisson distributions; 
computation of quantile values for the Normal, Student's T, 
Chi-square and F distributions. The program also has the 
capability to store, retrieve, and modify data for use with 
the statistical procedures. The program was written in UCSD 
Pascal; because of that language's portability, little or 
no modification should be required to use it with 
other computers which are UCSD Pascal compatible. In 
addition, because of Pascal's block structure, the program 
can be easily modified or enhanced to include other 
statistical procedures which are of interest to the user. 


I. INTRODUCTION


For the operations analyst, or for that matter anyone who 
utilizes the methodology and problem solving approach of the 
operations analyst, some form of computational device is a 
necessity. Cost, rigid interface requirements, and a number 
of other factors have in the past frustrated and stifled the 
analyst in bringing to bear the necessary computational 
power on his problem. In the late 1970's, however, Texas 
Instruments introduced a significant amount of computing 
power packaged in the programmable TI-59 calculator. 
Revolutionary is perhaps a bit too strong as a description 
of the impact that this and like devices have made in the 
operations research community, but certainly most analysts 
would agree that the amount of computing power that can now 
be held in one's hand and used to solve problems is indeed 
remarkable.

In spite of this impact, however, the hand-held 
calculator's contribution might even now be waning and 
yielding to a more spectacular capability found in the 
microcomputer. The growth in the capability of these 
devices since the beginning of the industry in 1971 has been 
phenomenal. Mastrakas details this growth and gives a 
glimpse of the possible direction that the 1980's will see 
in this industry [Ref. 1]. One factor accounting for the 
growth in microcomputer technology and one that serves to 
ensure its future is the intense competition that pervades 
the industry. When one thinks of hand-held calculators, he 
thinks of Texas Instruments or perhaps Hewlett-Packard. When 
one thinks of microcomputers, he may think of Apple, Exidy, 
North Star, PET, TRS-80, or a number of other devices with 
like capability.





A typical microcomputer package costing between $2,000 
and $2,500 might consist of the following: the computer 
(about the size of a small portable typewriter) with 64 K 
bytes of usable random access memory, two floppy-disk drives 
(5 1/4 inch diameter) for program storage, and a black and 
white monitor for output display. A variety of programming 
languages is also available, including University of 
California, San Diego (UCSD) Pascal, BASIC, FORTRAN, and 
PILOT. Some of the microcomputers also have a graphics 
capability allowing the user to visualize mathematical 
forms, plot graphs, and plot data observations.

The American populace is being conditioned through 
current periodicals and news features to expect an 
increasingly important role for the microcomputer in 
everyday life, from grocery shopping to environmental 
control for the home. The December 1, 1980 issue of U.S. 
News and World Report predicts that eighty percent of the 
United States' households will have a microcomputer by 1990. 
In view of the prospective proliferation of these devices 
and also the significant computing power that now exists, 
the operations research analyst can ill afford not to begin 
to exploit the capabilities of microcomputers. Indeed, the 
hardware capabilities have grown and are growing so rapidly 
that today good compatible software to support these 
impressive capabilities is seriously lacking in specialized 
fields such as operations research. It is this software 
deficiency that this thesis addresses.





II. BACKGROUND


The operations analyst can, on occasion, be required to 
establish confidence intervals or test hypotheses about an 
unknown parameter from a known or assumed population. The 
computer program written as a part of this thesis allows the 
analyst to quickly and easily accomplish these tasks when 
observations are from Normal, Exponential or Bernoulli 
populations.


A. DESCRIPTION OF THE MICROCOMPUTER SYSTEM 

The software development was done on an Apple II Plus 
microcomputer with two floppy-disk drives (5 1/4 inch 
diameter) as add-on peripherals. The system is equipped 
with a language system giving the capability to use the 
University of California, San Diego (UCSD) Pascal 
programming language as well as the BASIC language which 
comes resident with the computer. The standard Apple 
computer has a forty column output display which makes it 
compatible for use with a standard television set. This 
capability can be enhanced to an eighty column display with 
an additional peripheral device, provided a monitor is used 
for display in lieu of a television. The output format for 
the program is written for an eighty column display device; 
however, using the special built-in features of the Apple, 
the format can easily be made to display split screen on a 
system not equipped with an expanded display peripheral.


B. SOFTWARE DEVELOPMENT 

Ling and Muller give several considerations which should 
be observed in the development of software for statistical 
analysis [Ref. 2, Ref. 3]. Among these considerations are 
the following:





1. Choice of Programming Language 

The Pascal programming language used in this 
software development effort has a number of features which 
make it particularly attractive for use with microcomputers. 
First, Pascal is a very concise language. The compiler is 
small and compact and fits easily within the available 
off-line storage space of the floppy-disk. Second, Pascal is a 
high-level, general-purpose language [Ref. 4]. The 
language was originally introduced in 1971, which is recent 
in comparison to most high-level languages. It was intended 
to be used in teaching new programmers good techniques and 
style. The Pascal language fully exploits the fundamental 
concepts of structured programming, which is a technique 
used by many professional programmers to write large complex 
computer programs [Ref. 4]. The use of these techniques 
facilitates developing programs in a modular fashion (i.e., 
breaking the overall package into logical sub-packages and 
proceeding to program, debug, and validate each sub-package 
individually). The final step in the process is to combine 
the sub-packages to form the overall package. Using the 
attendant statistical package as an example, six 
sub-packages or modules comprise the overall package. Each 
module is independent of the others and can stand alone when 
compiled with the main program. Third, the Pascal language 
performs arithmetic computations significantly faster than 
the BASIC language. Pascal is often implemented as a 
"pseudo-interpreted" language, meaning that the text versions 
of programs are first compiled into a code file. It is 
during the compilation phase that syntax errors are detected 
by the compiler and brought to the programmer's attention. 
This code file is interpreted and executed during the 
execution phase of the program. Host processors can and 
typically do interpret a Pascal code file significantly 
faster than a corresponding BASIC program which performs the 
same computations [Ref. 4].

2. Computational Efficiency 

In spite of their impressive capabilities, 
microcomputers are decidedly inferior to larger computers in 
the two key areas of computational speed and accuracy. 
Pascal, as implemented on the Apple, will only perform 
computations using six decimal places of accuracy [Ref. 5] 
and displays only five places past the decimal. Accordingly, 
algorithms which are used and work very well on larger 
computers might have no chance of producing the same results 
on microcomputers simply because the execution time is 
excessive, or they require double precision arithmetic.


C. PORTABILITY 

A portable program is one that can be run on a number of 
different computer systems [Ref. 2]. Programs written in 
the Pascal language are valuable in this regard in that they 
may be run on a variety of microcomputers without 
alteration. Some of the machines for which this is true are 
computers with the following microprocessors: 8080, 8085, 
Z80, 6502 (Apple), 6800, and 9900. Portability is a very 
strong asset of the Pascal language even though it is 
accomplished at the expense of reduced computational speed. 
The compiled code version of programs is called "p-code". 
Each machine has a special interpreter program which takes 
the "p-code" and converts it to a form compatible with the 
existing host microprocessor.


D. EASILY RETRIEVABLE INFORMATION FILES 

Ling suggests the use of help files to allow the user to 
make efficient use of a program [Ref. 3]. The number of 
information files which are available and their content is 
clearly a matter of judgment based on assumptions concerning 
the knowledge of the user population about the procedures 
used in the program. Anticipating that most of the users of 
the accompanying statistical package will be familiar with 
the basic concepts of the procedures themselves, only one 
help file which pertains to data entry requirements is 
included.


E. SIMPLE USER INTERFACE 

The user is called upon to make numerical entries 
throughout the program. Because Pascal is a strongly typed 
language, a variable of type integer cannot be assigned 
floating point values. Indeed, this requirement is so 
strict that if the program expects the user to enter an 
integer and he accidentally enters a number in decimal point 
notation, the program will abort and cause the entire system 
to re-initialize. Situations such as this are, of course, 
undesirable. Moreover, all programs written with the intent 
of establishing a dialogue between the program and the user 
should be as trouble-free as possible for the user and 
minimize as much as possible the user's chances of 
committing a fatal error when responding to program prompts 
or entering data. In general, schemes to accomplish this 
are costly in terms of computational efficiency and 
programming steps; however, user convenience is almost 
always worth the costs. The problem with data entry or 
numerical entry from the keyboard is addressed in this 
statistical package by making all numerical entries using 
a string and converting the string to the numerical value 
it represents. A string is a variable type which is a 
linear array of characters. For example, given a string 
variable called 'S', which is assigned a value of '235', 
the character in array position 'S[1]' is '2', 'S[2]' is 
'3', and 'S[3]' is '5'. Since the characters '0' through 
'9' have corresponding numerical American Standard Code 
for Information Interchange (ASCII) values of forty-eight 
through fifty-seven, conversion is accomplished by 
subtracting forty-eight from the character's ASCII value. 
A Pascal procedure which converts strings to numbers is 
shown in Appendix B.
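
The conversion just described can be sketched as follows. This is an illustration in Python, not the Pascal procedure of Appendix B; the function name string_to_number, the want_integer flag, and the exact handling of sign, decimal point, and commas are assumptions made for the example. Scientific notation ('E') is left out to keep the sketch short.

```python
ASCII_ZERO = 48  # ASCII code of the character '0'


def string_to_number(s, want_integer=False):
    """Convert a string of digits to a number, one character at a time."""
    sign = 1
    whole = 0
    fraction = 0.0
    scale = 1.0
    seen_point = False
    for position, ch in enumerate(s):
        if ch == ",":
            continue  # commas are a user convenience and are skipped
        if ch in "+-" and position == 0:
            sign = -1 if ch == "-" else 1
        elif ch == "." and not seen_point:
            seen_point = True
        elif "0" <= ch <= "9":
            digit = ord(ch) - ASCII_ZERO  # '0'..'9' (48..57) become 0..9
            if seen_point:
                scale /= 10.0
                fraction += digit * scale
            else:
                whole = whole * 10 + digit
        else:
            raise ValueError("illegal character: " + repr(ch))
    value = sign * (whole + fraction)
    # The caller decides the result type; an integer result is truncated.
    return int(value) if want_integer else float(value)
```

Because the caller chooses the result type, the same routine can serve prompts that need an integer and prompts that need a real number, which is the second advantage the text goes on to describe.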

The advantages in using a scheme such as this for all 
numerical entries are two-fold. First, the user can take 
advantage of the direct cursor addressing available on many 
microcomputer video displays to correct a data entry prior 
to entering it into computer memory. This is not possible 
when the program expects real numbers as input. Second, the 
program segment which converts the string may return either 
an integer or a real number, whichever is required by the 
program. This avoids having the user concerned with the 
typing requirements demanded by the Pascal language. 


III. ALGORITHMS


Algorithms used for statistical computations on 
microcomputers should be selected with the goal of providing 
the best accuracy achievable at the minimum computation 
time. Given the limitation of only six decimal places of 
accuracy on the Apple microcomputer, many algorithms 
requiring more precision in their computations must be 
rejected. The algorithms remaining as candidates must be 
carefully screened to ensure that their required 
computations are reasonable from a time standpoint and that 
they exercise fully the accuracy capability of the machine.


A. CALCULATION OF VARIANCE 

To illustrate the complications stemming from reduced 
accuracy, consider the following example given by Forsythe 
[Ref. 6]. 


Find the variance of the following set of numbers: 
48499, 48503, 48500, 48498, 48500 
The definition for the variance is 

$$ s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2 $$

where s^2 = variance 
      N = number of observations 
      \bar{x} = sample mean. 

This formula can be expanded to the following form: 

$$ s^2 = \frac{1}{N-1} \left( \sum_{i=1}^{N} x_i^2 - N\bar{x}^2 \right) $$

Use of this formula in the Apple, however, would yield a 
variance of zero, simply because the required accuracy is 
not available for computations. The answer of zero is, of 
course, incorrect. Forsythe in his article on statistical 
computing offers the following alternative algorithm to 
compute the variance. 

   (* SUM holds the running mean; S2 holds the running sum *)
   (* of squared deviations about that mean.               *)
   SUM := 0; 
   S2 := 0; 
   FOR I := 1 TO N DO 
   BEGIN 
      READ (X); 
      DEVIATION := X - SUM; 
      SUM := SUM + DEVIATION/I; 
      S2 := S2 + DEVIATION * (X - SUM); 
   END; 
   S2 := S2/(N - 1); 

This algorithm produces the correct answer, variance = 
3.5. It also illustrates how many of the limitations of 

the microcomputer can be overcome through careful selection 


of algorithms. 
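
The contrast between the two variance formulas can be reproduced with a short sketch. It is written in Python, and it stands in for the Apple's limited-precision reals by rounding every intermediate result to IEEE single precision; that substitution, and the names f32, naive_variance_f32, and one_pass_variance, are assumptions made for illustration rather than a model of the Apple's actual arithmetic.

```python
import struct


def f32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_variance_f32(data):
    """Expanded formula (sum of squares minus N * mean^2) in single precision."""
    n = len(data)
    total = 0.0
    total_sq = 0.0
    for x in data:
        total = f32(total + x)
        total_sq = f32(total_sq + f32(x * x))
    mean = f32(total / n)
    correction = f32(n * f32(mean * mean))
    return f32(f32(total_sq - correction) / (n - 1))


def one_pass_variance(data, rnd=lambda v: v):
    """Running-mean update; rnd rounds each step (identity = full precision)."""
    mean = 0.0
    s2 = 0.0
    for i, x in enumerate(data, start=1):
        deviation = rnd(x - mean)
        mean = rnd(mean + rnd(deviation / i))
        s2 = rnd(s2 + rnd(deviation * rnd(x - mean)))
    return rnd(s2 / (len(data) - 1))


DATA = [48499.0, 48503.0, 48500.0, 48498.0, 48500.0]

if __name__ == "__main__":
    print("naive, single precision:   ", naive_variance_f32(DATA))
    print("one-pass, single precision:", one_pass_variance(DATA, f32))
    print("one-pass, full precision:  ", one_pass_variance(DATA))
```

In the naive version the squares are near 2.35 x 10^9, where neighboring single-precision values are 256 apart, so the true difference of 14 between the sum of squares and N times the squared mean is lost. The running-mean version only ever forms small deviations and stays close to 3.5.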


B. DISTRIBUTIONS AND INVERSES 

The algorithms used to compute probability distributions 
and inverses and the source of each are listed in Appendix 
A. When compared to the tabular values listed in Dixon and 
Massey [Ref. 7], the algorithms are accurate to at least 
three decimal places in probability with the exception of 
the F distribution. The F distribution is accurate to three 
decimal places in probability in almost all cases; however, 
some values may differ from the listed tabular values by as 
much as .002 in probability. Although the F quantiles may 
differ slightly from the listed tabular values, the 
probabilities corresponding to the values given by the 
algorithm are accurate to three decimal places. All of the 
algorithms produce results within three seconds except the T 
distribution. When computing for very large degrees of 
freedom for the T distribution, computation time is a 
function of the degrees of freedom. Typically 1000 degrees 
of freedom takes approximately eight seconds.

Often, increased accuracy can only be gained through 
iterative techniques. This is the case with many of the 
probability distributions and their inverses, the 
T distribution being an example. In the algorithms shown in 
Appendix A much of the excessive computation time has been 
alleviated by combining two or more algorithms. For 
example, in the Chi-square distribution, the F distribution 
and the inverse F distribution, one algorithm is used for 
small degrees of freedom and another for large degrees of 
freedom. Good algorithms for these distributions exist for 
large degrees of freedom which are not based on iterative 
techniques and hence are computationally fast. However, for 
small degrees of freedom, their accuracy falls off rapidly. 
Conversely, the algorithms using iterative techniques are 
very accurate at all ranges, but slow for the larger degrees 
of freedom since the number of iterations required is 
proportional to degrees of freedom. The break between small 
and large degrees of freedom is purely subjective, based on 
choosing the best combination of speed and accuracy. 
Selection of the algorithms themselves was likewise based on 
the best combination of speed and accuracy.
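
The idea of switching algorithms at a degrees-of-freedom threshold can be sketched for the Chi-square distribution. The two methods below are stand-ins chosen for illustration and are not claimed to be the ones listed in Appendix A: an iterative series for the incomplete gamma function, and the Wilson-Hilferty normal approximation, which needs no iteration. The threshold of 100 and all function names are likewise assumptions.

```python
import math


def chi_square_cdf_series(x, df):
    """Iterative method: series for the regularized lower incomplete gamma."""
    if x <= 0.0:
        return 0.0
    a = df / 2.0
    half_x = x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-12:
        n += 1
        term *= half_x / (a + n)
        total += term
    return total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))


def chi_square_cdf_wilson_hilferty(x, df):
    """Closed-form method: cube-root transformation to a standard Normal."""
    if x <= 0.0:
        return 0.0
    c = 2.0 / (9.0 * df)
    z = ((x / df) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def chi_square_cdf(x, df, large_df=100):
    """Use the series below the threshold and the approximation above it."""
    if df < large_df:
        return chi_square_cdf_series(x, df)
    return chi_square_cdf_wilson_hilferty(x, df)
```

With two degrees of freedom the distribution function has the closed form 1 - exp(-x/2), which gives a convenient check on the series branch.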


C. POISSON AND BINOMIAL DISTRIBUTIONS 

Large mainframes have the capability of computing the 
Poisson and Binomial distributions directly from their 
definitions. 


POISSON: 
If the random variable X is distributed Poisson with 
parameter \lambda, then 

$$ P[X \le k] = \sum_{j=0}^{k} \frac{e^{-\lambda} \lambda^{j}}{j!} $$


BINOMIAL: 
If the random variable X is distributed Binomial with 
parameters n and p, then 

$$ P[X \le k] = \sum_{j=0}^{k} \binom{n}{j} p^{j} (1-p)^{n-j} $$


Both of these expressions contain factorials and 
summations; consequently, for large values of 'k' in the 
Poisson and 'n' and 'k' in the Binomial, execution time 
might be excessive, or the intermediate values in 
computation might exceed 1 x 10^38 (the maximum number capable 
of being represented on the Apple). A better solution for 
the calculation of these probabilities is to use the 
Chi-square identity for the Poisson and the F identity for 
the Binomial [Ref. 8, Ref. 9].


POISSON: 
Given X ~ Poisson(\lambda), then 

$$ P[X \le k] = 1 - \chi^2_{(2k+2)}(2\lambda) $$

where \chi^2_{(2k+2)} denotes the Chi-square cumulative 
distribution function and 2k + 2 are the degrees of freedom 
of the Chi-square variate. 


BINOMIAL: 
Given X ~ Binomial(n, p), then 

$$ P[X \le k] = F_{(2n-2k,\ 2k+2)}\left( \frac{(k+1)(1-p)}{(n-k)\,p} \right) $$

where F denotes the F cumulative distribution function, 
2n-2k equals the degrees of freedom for the numerator 
and 2k+2 equals the degrees of freedom for the denominator. 


When k = n, the probability is 1.0.
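
The Poisson identity can be checked numerically with a short sketch. It is written in Python; the function names are hypothetical, and the Chi-square distribution function used here is a plain incomplete-gamma series chosen for the example, not necessarily the algorithm of Appendix A. The Binomial identity could be checked the same way given an F distribution function, which is not shown.

```python
import math


def poisson_cdf_direct(k, lam):
    """Sum the Poisson probabilities for 0..k straight from the definition."""
    term = math.exp(-lam)
    total = term
    for j in range(1, k + 1):
        term *= lam / j  # builds lam^j / j! without forming a factorial
        total += term
    return total


def chi_square_cdf(x, df):
    """Chi-square distribution function via the incomplete gamma series."""
    if x <= 0.0:
        return 0.0
    a = df / 2.0
    half_x = x / 2.0
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-14:
        n += 1
        term *= half_x / (a + n)
        total += term
    return total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))


def poisson_cdf_via_chi_square(k, lam):
    """P[X <= k] = 1 - ChiSquare(2k + 2 degrees of freedom) evaluated at 2*lam."""
    return 1.0 - chi_square_cdf(2.0 * lam, 2 * k + 2)


if __name__ == "__main__":
    for k, lam in [(0, 2.0), (4, 3.5), (20, 12.0)]:
        print(k, lam, poisson_cdf_direct(k, lam), poisson_cdf_via_chi_square(k, lam))
```

The value of the identity on a small machine is that one well-tested Chi-square routine serves both distributions, so no separate factorial-based summation has to be maintained.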


D. STATISTICAL ALGORITHMS 

Derivation of the algorithms used in finding confidence 
intervals and hypothesis testing is shown in Storer 
[Ref. 10]. 


IV. DESCRIPTION OF PACKAGE


The options available to the user are shown in the block 
diagram (Figure 1). When the program is executed, the user 
begins in the "outer level". To proceed he must select 
one of the seven options which are shown on a menu (Figure 
2).


A. THE DATA ENTRY MODULE 

Selecting option '6' from the outer level menu (Figure 
2) will cause the data entry module menu (Figure 3) to 
appear on the screen.
1. General 
The data entry requirements are intended to be as 
trouble-free as possible for the user. The user is prompted 
for data input by the following line: 


Record N --> 


'N' is an integer sequentially updated by the 
program when the 'return' key is pressed. Following the 
arrow, the user inputs as many data values as he wishes 
with entries separated by one or more spaces. The only 
restriction is that he should not exceed the length of the 
display line. The nomenclature, record, indicates only a 
logical or convenient grouping of data from the user's 
point of view.

Prior to entering any data, the user is asked 
whether or not the observations he is entering are paired. 
Since two of the statistical procedures are predicated on 
paired data, answering 'yes' to this question will cause 
summary statistics to be computed on 'X,Y' pairs. These 
summary statistics are only good for use with the 
procedures requiring paired observations. All data entered 
must be in 'X,Y' pairs with the 'X' observation listed 
first. All other statistical procedures and their attendant 
data entry requirements assume that all of the observations 
come from a single population. These general requirements 
are available to the user when he selects option '1' from 
the data entry menu (Figure 3). 


1) Hypothesis Testing One Parameter 
2) Hypothesis Testing Two Parameters 
3) Confidence Intervals Single Population 
4) Confidence Intervals Bivariate Populations 
5) Distributions and Inverses 
6) Data Entry 
Q)uit 

Figure 2. The Outer Level Menu 


1) Instructions 
2) Create a new data file 
3) Correct/Add to existing datafile 
4) Enter data without storing 
5) Review existing datafile 
6) Review summary statistics of existing file 
Q)uit and return to outer level 

Figure 3. The Data Entry Module Menu 

2. Create A New Data File 

Selecting option '2' (Figure 3) will prompt the user 
to specify a file name for the data observations. After 
responding with a file name, the operating system 
establishes a directory entry at the beginning of the 
largest unused block of space containing at least fifteen 
blocks on the specified disk. Because the filing system in 
the University of California, San Diego (UCSD) Pascal 
language is random access, each data file entered will have 
allocated fifteen blocks of space to ensure that there is 
enough room to extend the file if the user desires to do so 
at some later time.

The floppy-disk (5 1/4 inch diameter) used by the 
Apple system provides a storage space of 280 blocks. This 
results in a capability to store seventeen data files on 
each disk. Each data file can contain a maximum of ninety 
records. Since each record can contain as many observations 
as the user desires provided that it does not exceed the 
length of the display line, a reasonable planning figure is 
eight data observations per record. This results in an 
upper limit of approximately 720 data entries per data 
file.
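
The planning figures above follow from simple arithmetic, sketched here in Python. One assumption is needed to arrive at seventeen files: each fifteen-block data file is counted together with the one-block summary statistics file that the program creates for it on the same disk. The constant and function names are hypothetical.

```python
DISK_BLOCKS = 280            # storage space of one 5 1/4 inch disk
DATA_FILE_BLOCKS = 15        # space allocated to each data file
SUMMARY_FILE_BLOCKS = 1      # assumed: companion summary statistics file
MAX_RECORDS_PER_FILE = 90    # record limit for one data file
PLANNING_OBS_PER_RECORD = 8  # planning figure for one display line


def data_files_per_disk():
    """Whole data files (with their summary files) that fit on one disk."""
    return DISK_BLOCKS // (DATA_FILE_BLOCKS + SUMMARY_FILE_BLOCKS)


def planned_entries_per_file():
    """Approximate upper limit on data entries in one data file."""
    return MAX_RECORDS_PER_FILE * PLANNING_OBS_PER_RECORD


if __name__ == "__main__":
    print(data_files_per_disk(), "data files per disk")
    print(planned_entries_per_file(), "data entries per file")
```

Any space taken by the disk directory or by system files would reduce the count, so the figure is best read as a planning bound.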

Data entry for a new file begins with 'Record 1'. 
The user's only concerns when entering data should be to 
separate each observation by one or more spaces, to not 
exceed the length of the display line, and to enter 'X,Y' 
pairs if he has previously indicated paired observations. 
Since the data is entered as a string variable and then 
converted, it is possible to correct any entry prior to 
going to the next record by simply using the 'back arrow' 
key to move to the place at which the correction is to be 
made. The legal symbols which may be used in data entry are 
the digits '0' through '9', the decimal point, the comma, 
the plus (+) and minus (-) signs, and the 'E' for scientific 
notation. Commas are ignored by the procedure which 
converts the strings and are included only as a convenience 
for the user.

Pressing the 'return' key at the end of a record 
terminates that record and prompts the user to input the 
next record in sequence. If the user inadvertently enters 
an illegal character while entering data, the program will 
advise him of this, indicate what the character was, and 
prompt him to reenter the record.
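
The record checking described above can be sketched as follows. The sketch is in Python and is not the package's own procedure: the name parse_record and its return convention are assumptions, and Python's built-in float conversion is used in place of the character-by-character conversion of Appendix B. Entries that use only legal characters but are still malformed (for example '1.2.3') are not handled.

```python
LEGAL_CHARACTERS = set("0123456789.,+-E")


def parse_record(line):
    """Return (values, None) for a good record or (None, bad_character)."""
    for ch in line:
        if ch != " " and ch not in LEGAL_CHARACTERS:
            return None, ch  # report the first illegal character found
    values = []
    for field in line.split():  # entries are separated by one or more spaces
        values.append(float(field.replace(",", "")))  # commas are ignored
    return values, None


if __name__ == "__main__":
    for record in ["12 3.5  -4 1,000 2E3", "12 3x 4"]:
        values, bad = parse_record(record)
        if bad is None:
            print("Record accepted:", values)
        else:
            print("Illegal character", repr(bad), "- reenter the record")
```

Validating the whole line before converting anything means a record is either accepted in full or rejected in full, which matches the prompt to reenter the record.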

Once the user has entered all of his data, he must 
press the 'escape' key and the 'return' key following the 
last data entry. This will close the data file. When the 
file is closed, the following summary statistics are 
computed on the observations in the file: 

a. The Sample Mean 

b. The Sample Standard Deviation (N-1) 

c. The Sum of the Observations 

d. The Sum of Squares of the Observations 

e. The Number of Observations 

For paired data, the same statistics are computed; 
however, they are computed on the differences of the 'X,Y' 
pairs. Hence, the number of observations for a data file of 
paired observations is exactly half of the total number of 
data entries.
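
The five summary statistics, and their paired-data variant, can be sketched in Python as follows. The function names and the dictionary layout are hypothetical, and taking each difference as X minus Y is an assumption, since the text does not state the direction.

```python
import math


def summary_statistics(observations):
    """Mean, standard deviation (N-1 divisor), sum, sum of squares, count."""
    n = len(observations)
    total = sum(observations)
    sum_of_squares = sum(x * x for x in observations)
    mean = total / n
    variance = sum((x - mean) ** 2 for x in observations) / (n - 1)
    return {
        "mean": mean,
        "std_dev": math.sqrt(variance),
        "sum": total,
        "sum_of_squares": sum_of_squares,
        "n": n,
    }


def paired_summary_statistics(entries):
    """Entries alternate X, Y; statistics are taken on the differences."""
    differences = [entries[i] - entries[i + 1]
                   for i in range(0, len(entries) - 1, 2)]
    return summary_statistics(differences)


if __name__ == "__main__":
    print(summary_statistics([2, 4, 4, 4, 5, 5, 7, 9]))
    print(paired_summary_statistics([5, 3, 7, 4, 6, 6]))
```

Six paired entries yield three differences, which illustrates why the number of observations in a paired file is half the number of data entries.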







The summary statistics for each data file are kept 
in a separate file that is initially established on the same 
disk as newly created data files. The file of summary 
statistics requires one block of disk storage space. It is 
differentiated from the original file of observations by an 
'S' concatenated to the original file name.

3. Correct/Add to Existing Data File 

Selecting option '3' (Figure 3) while in the data 
entry module will prompt the user to specify a file name. 
Once the data file is retrieved, the monitor will show the 
name of the file, the number of records in the file, and the 
number of observations. Immediately after this information 
is the following prompt:
Enter Record Number --> 


Selecting any record number between '1' and the 
total number of records in the file will cause the retrieval 
of that record and will display as follows: 

Old Record: 
Record 2 --> 12384 5 


You may replace the complete record or 
press <RTN> for no changes. 


Record 2 -->


Pressing 'return' leaves the existing record 
unchanged and prompts the user to enter another record 
number. If corrections are to be made, the user must enter 
the complete new record opposite the lower prompt and press 
'return'. The updated record will display as follows: 


Record 2 --> 12335 
Press <RTN> if OK, <ESC> if not. 


Pressing 'return' completes the update and prompts 
for a new record. Pressing 'escape' produces the following 
display:







Enter corrected data and <RTN> 
Record N -->


The user then may retype the line. This sequence 
may be repeated until the record appears as the user wants. 
Entire records are erased or deleted by typing a space to 
produce a blank line. If the user desires only to update an 
existing data file and not extend it, he types the number 
'-1' in response to the prompt for record number, which 
closes the updated file.

If the user selects a record number that is greater 
than the number of records in the file, he reenters the data 
entry phase beginning with the record number immediately 
following the last existing record. For example, if there 
are thirty-seven existing records in the file, selecting any 
number greater than thirty-seven will produce the following 
display: 


Enter new records. 
Enter <ESC> as the last entry and <RTN> to terminate input. 
Record 38 --> 


From this point, the user proceeds exactly as if he 
were in the data entry phase and terminates by pressing the 
'escape' key immediately following the last entry.

4. Enter Data Without Storing 

This option allows the user to compute the mean, 
standard deviation, sum of observations, sum of squares of 
observations, and number of observations. The format for 
entering data is exactly the same as previously discussed 
for creating a new file. As indicated, data is not stored 
under this option; hence, once a record is terminated, the 
entries for that record cannot be recovered nor changed. 
This option provides an expedient way to determine the 
summary statistics of a group of observations. 

5. Review an Existing Data File 

Selecting this option allows the user to quickly 
review any or all of an existing data file in blocks of ten 
records at a time. The user is prompted to enter the file 
name of an existing file. The program retrieves that file 
and displays the first ten records. Pressing 'return' at 
this point causes the next ten records to be displayed, 
and so on. Pressing the 'escape' key at the end of any display 
returns the user to the menu for the data entry module 
(Figure 3).
6. Review Summary Statistics of Existing File 

All files that are stored on disk have an associated 
summary statistics file that is created by the program when 
a newly created or updated data file is closed. This file 
contains the following information: 

a. Mean 

b. Standard Deviation 

c. Sum of Observations 

d. Sum of Squares 

e. Number of Observations 

Only the summary statistics file is called when 
specifying a data file to be used in the other modules 
containing the statistical procedures. Because of this, it 
is not necessary to have the original data file on-line when 
performing the statistical procedures; only the summary 
statistics file is required. When using the filing system 
resident in the Apple Pascal language system to obtain 
directory listings of various disks, the summary statistics 
file is distinguished by a concatenated 'S' on the end of 
the original file name. For example, if the original file 
name was STAT:DATA1, then the corresponding summary 
statistics file is named STAT:DATA1S. Since the length of 
all summary statistics files is one block, it is possible to 
access a maximum of 274 files of summary statistics on any 
one disk of storage. 

7. Quit 

Pressing the 'Q' key will return the user to the 
outer level (Figure 2).


B. DISTRIBUTIONS AND QUANTILES MODULE 

Selection of any of the options from the distributions 
and quantiles module menu (Figure 4) will produce further 
prompts which require the user to enter the necessary 
information concerning values of the random variable, 
degrees of freedom, and probabilities as appropriate. 
After each computation the following prompt appears: 


C)ontinue or Q)uit 


Pressing the 'C' key will allow the user to 
continue calculation in the previously selected distribution 
or quantile. Pressing the 'Q' key will return the user to 
the menu for this module (Figure 4).

The Pascal language system allows the user to develop 
his own specialized libraries of often used subroutines for 
general-purpose or special-purpose computations. It is in 
such a library that the algorithms for the distributions and 
quantiles are kept. Using a special library has two major 
advantages. First, when the user is developing the main 
program, he is not penalized by extra compilation time for 
any of the routines in the library. The code in the library 
is linked by a separate process [Ref. 4]. Second, the 
library can be used by other programs which require the use 
of the algorithms contained therein. The algorithms for the 
distributions and quantiles fit logically in a library since 
it is likely that other statistical packages will require 
their use. The Apple reference manual explains the 
procedure used to establish new libraries [Ref. 11]. 


Normal Distribution 
T Distribution 
Chi-Square Distribution 
F Distribution 
Binomial Distribution 
Poisson Distribution 
Normal Quantiles 
T Quantiles 
Chi-Square Quantiles 
F Quantiles 
Q)uit and return to outer level 

Figure 4. The Distributions and Quantiles Module Menu 



   Population     Assumptions               Parameter 
1) Normal         u ?, sigma-sq known       u 
2) Normal         u & sigma-sq ?            u 
3) Normal         u & sigma-sq ?            sigma-sq 
4) Normal         u known, sigma-sq ?       sigma-sq 
5) Bernoulli                                p 
6) Exponential                              ATTY 
Q)uit and return to outer level 

u ==> mean 
sigma-sq ==> variance 

Figure 5. Confidence Intervals Single Population Menu 


C. CONFIDENCE INTERVALS AND HYPOTHESIS TESTING MODULES

Selecting either the confidence intervals or hypothesis 
testing options will cause the menu for that module to be 
displayed. All of these menus contain information similar 
to that shown in Figure 5 for confidence intervals. 

The parameters about which an interval is to be computed 
or a hypothesis is to be made are listed in the right hand 
column of the menu. The distribution from which the 
observations came and assumptions about the populations are 
listed in columns one and two, respectively.

1. Data Requirements 

When one of the options is selected that does not 
involve Bernoulli or Poisson populations the following 
display appears: 

1) Use existing data file 
2) Enter data and store it 
3) Enter data w/o storing 
4) Use summary statistics 

Because of the nature of the observations, any tests 
or intervals involving Bernoulli or Poisson observations are 
entered using summary statistics; hence, for these cases, 
this display is skipped.

a. Use existing Data File 

The user is prompted for the name of the data
file. When the file name is entered, the program retrieves
the summary statistics file associated with that file from
disk. In the case of bivariate populations, two file names
are needed, the first containing the 'X' observations and the
second containing the 'Y' observations. Paired observations,
as noted previously, are entered in one file.

b. Enter Data and Store It

Selecting this option will display a short
message to the user informing him that all data storage must
be accomplished from the data entry module (Figure 3). The
user has the option at this point to enter the data without
storing it or to return to the outer level (Figure 2) and
select the data entry module.

c. Enter Data W/O Storing It

Data is entered in the same format as
previously discussed in the section pertaining to the data
entry module. No permanent disk record is made of the
entries. Hence, once each record of observations is
terminated, there is no way to retrieve it to make
corrections. Where bivariate populations are required, the
user is prompted to enter all of the 'X' observations first
and all of the 'Y' observations second. For paired
observations all of the 'X,Y' pairs are entered as one
population.

d. Use Summary Statistics

On all other tests or intervals, use of summary
statistics is optional except as previously mentioned for
data from Bernoulli or Poisson populations. At each
prompt, the user is asked only for the information
necessary to perform the statistical procedure he has
selected. Each prompt distinguishes whether the statistics
required are the sample values (sample mean, sample
standard deviation) or the true parameter values (true
mean, true standard deviation).
2. Confidence Intervals 

When computing confidence intervals, the user must
supply additional information concerning the desired
level of confidence and the type of interval (two-sided,
one-sided upper, one-sided lower). Typically, computation
of the desired interval takes one or two seconds, and the
result is displayed as follows:
95.00 Percent confidence intervals for u


Sample Mean        = 3.000
Standard deviation = 1.581
Upper
Lower                1.036


Another interval using same data, Y)es N)o -->


Following the display, the user is asked whether or
not he desires to compute another interval using the same
data. If he responds 'yes', he may then vary the
confidence level and/or the type of interval without
having to again specify the data base. Answering 'no' will
return the user to the menu for the module in which
computations are currently being performed.
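
The interval computation described above can be sketched
from summary statistics alone. The following Python fragment
is an illustration only and is not the package's Pascal
code: the function names are hypothetical, the Student's T
quantile is found by numerical integration and bisection
rather than by the package's algorithms, and the sample size
n = 5 is an assumption (the display does not show it) chosen
because it gives a lower limit close to the 1.036 shown.

```python
import math

def t_pdf(t, df):
    """Density of Student's T with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on the density from 0 to |t|."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_quantile(p, df):
    """Invert t_cdf by bisection, for 0 < p < 1."""
    if p < 0.5:
        return -t_quantile(1 - p, df)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:      # widen the bracket until it holds p
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def mean_interval(mean, sd, n, level=0.95):
    """Two-sided interval for the mean, variance unknown."""
    half = t_quantile(1 - (1 - level) / 2, n - 1) * sd / math.sqrt(n)
    return mean - half, mean + half

# Summary statistics from the display; n = 5 is assumed.
lower, upper = mean_interval(3.0, 1.581, 5)
print(lower, upper)
```

A one-sided interval would use the quantile at the full
confidence level (for example 0.95 instead of 0.975) on the
side of interest.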

3. Hypothesis Testing

Hypothesis tests require the user to specify the
null hypothesis. Typically, the hypotheses involve 'equal
to', 'less than or equal to', or 'greater than or equal to'
comparisons and are displayed for the user in a form
similar to the one below:


1) u = u[0]
2) u <= u[0]
3) u >= u[0]


The symbol [0] represents the null hypothesis 
value. The user enters this value if required by the test. 
An example display following computation is as follows: 
HYPOTHESIS: u = u[0]
Sample mean = 3.000
The P-value is 0.519
Another test using the same data, Y)es N)o -->


The user is not told to accept or reject the
hypothesis; rather, he is given a p-value as shown above.
The p-value, or probability level, is an indication of
how consistent the data are with the hypothesis [Ref. 12].
High p-values indicate that the data are consistent with the
null hypothesis; conversely, low p-values cast doubt on the
validity of the hypothesis. A p-value of .05, for example,
indicates that if the hypothesis is indeed true, there is
only one chance in twenty of obtaining data at least as
inconsistent with the hypothesis as the data used in the
test. Upon completion of the test, the user has the option
to perform another test with the same data or return to the
menu for the module in which tests are currently being
performed.
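
As a worked illustration of how such a p-value arises, the
Python sketch below computes a two-sided p-value for a test
on the mean with the variance unknown. It is not the
package's code. The function names are hypothetical, and the
sample size n = 5, the standard deviation 1.581, and the
null value u[0] = 2.5 are assumptions (the example display
does not show them) chosen because they give a p-value close
to the 0.519 shown.

```python
import math

def t_pdf(t, df):
    """Density of Student's T with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def t_cdf(t, df, steps=2000):
    """P(T <= t): Simpson's rule on the density from 0 to |t|."""
    a = abs(t)
    h = a / steps
    s = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = s * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def two_sided_p_value(mean, sd, n, u0):
    """p-value for the hypothesis u = u0 against u <> u0."""
    t_stat = (mean - u0) / (sd / math.sqrt(n))
    # Probability of a T statistic at least this far from zero.
    return 2.0 * (1.0 - t_cdf(abs(t_stat), n - 1))

# Assumed inputs: n = 5, sd = 1.581, u0 = 2.5.
p = two_sided_p_value(3.0, 1.581, 5, 2.5)
print(p)
```

For the one-sided hypotheses only one tail is used, for
example `1.0 - t_cdf(t_stat, n - 1)` for the hypothesis
u <= u[0].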
V. CONCLUSIONS


The decade of the 1980's promises to be particularly
bright in terms of affording the operations research analyst
easy access to computing power. Microcomputers now
available and their more capable descendants will doubtless
play an important role in securing this access. However,
equally important are the software packages which will
accompany these computers.

The software package described in this paper provides
the analyst with a useful set of statistical tools which can
be used on one of the most popular current microcomputers,
the Apple. The five major modules contained in the package
(confidence intervals for single and bivariate populations,
hypothesis testing for one and two parameters, and the
distributions and quantiles) are designed to be easy for the
analyst to use and to cushion, as much as possible,
potential user mistakes. The algorithms used throughout the
program were chosen for their compatibility with the
microcomputer with respect to size and computing precision
and for providing the best combination of speed and
accuracy.

Pascal, the programming language used, offers not only
the advantages of portability and increased computational
speed, but also flexibility. Pascal is flexible in that
large complex programs are programmed in modular segments
which are then combined into the overall program. It
follows that programs developed in this way are easily
enhanced by the addition of new modules. Such is the case
with this statistical package, which could be significantly
enhanced by the addition of a regression and an analysis
of variance module.
This package and those to follow which are compatible
with current and future microcomputers can have a
significant impact in the analyst community in two key
areas. First, the analyst can be better educated. Simply
alleviating the tedium which accompanies the application of
many statistical procedures will give fledgling analysts the
opportunity to work more problems and be exposed to a
greater variety of situations in the school environment.
Perhaps of equal importance, the educational environment can
provide the opportunity to accustom the analyst to the
capabilities that can and should be available for his use in
a working environment. Second, by expanding computing power
into areas which were not privileged to have it before, the
educational process has a better opportunity to continue.
Today, it is reasonable to assume that the professional
growth of many analysts is stifled by a lack of computing
machinery with which to attack their problems.

Taking full advantage of the microcomputer's hardware
capabilities requires efficient compatible software. In
specialized areas such as operations research, the analysts
themselves must logically provide the bulk of the effort in
software development. The statistical package which is the
subject of this thesis effort scarcely begins to provide the
full complement of tools which the analyst requires. If the
operations research community is to take advantage in a
timely manner of the new opportunities afforded by
microcomputers, effort must continue now in software
development.
APPENDIX A


FUNCTION Z;


(**********************************************
*                                             *
*   NORMAL DISTRIBUTION                       *
*   SOME COMMON BASIC PROGRAMS                *
*   3RD ED., P. 128                           *
*                                             *
**********************************************)
CONST
  C1 = 0.4361836;
  C2 = -0.1201676;
  C3 = 0.937298;
  C4 = 0.33267;
  C5 = 2.5066283;

VAR
  XT,XX,R,T : REAL;

BEGIN
  IF STDEV <= 0.0 THEN
    BEGIN
      ERROR;
      EXIT(Z);
    END
  ELSE
    BEGIN
      XT := X;
      X := (X - MEAN)/STDEV;
      XX := X*X;
      R := EXP(-XX/2.0)/C5;
      X := 1.0/(1.0 + C4*ABS(X));
      T := 0.5 - R*(C1*X + C2*X*X + C3*X*X*X);
      IF XT < MEAN THEN
        Z := 0.5 - T
      ELSE
        Z := T + 0.5;
    END;
END; (* Z *)
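
The polynomial approximation coded in FUNCTION Z can be
tried outside the package. The Python sketch below is an
illustration under stated assumptions, not the package's
code: it reuses the constants C1 through C5 listed above,
the name `normal_cdf` is hypothetical, and an invalid
standard deviation raises an exception in place of the
package's ERROR procedure.

```python
import math

# Constants as listed in FUNCTION Z.
C1, C2, C3 = 0.4361836, -0.1201676, 0.937298
C4 = 0.33267
C5 = 2.5066283          # square root of 2*pi

def normal_cdf(x, mean=0.0, stdev=1.0):
    """Approximate P(X <= x) for a Normal(mean, stdev) variable."""
    if stdev <= 0.0:
        raise ValueError("standard deviation must be positive")
    z = abs((x - mean) / stdev)            # standardized distance from the mean
    r = math.exp(-z * z / 2.0) / C5        # standard normal density at z
    u = 1.0 / (1.0 + C4 * z)
    # Area between the mean and x.
    t = 0.5 - r * (C1 * u + C2 * u * u + C3 * u * u * u)
    return 0.5 - t if x < mean else 0.5 + t

print(normal_cdf(1.96))
print(normal_cdf(90.0, 100.0, 15.0))
```

Comparing the result with `0.5 * (1 + math.erf(z / math.sqrt(2)))`
is a convenient check on the accuracy of the approximation.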
FUNCTION INVZ;
(**********************************************
*                                             *
*   NORMAL QUANTILES                          *
*   HANDBOOK OF MATHEMATICAL FUNCTIONS        *
*   P. 933                                    *
*                                             *
**********************************************)
CONST
  C1 = 2.515517;
  C2 = 0.802853;
  C3 = 0.010328;
  D1 = 1.432788;
  D2 = 0.189269;
  D3 = 0.001308;

VAR
  PT,T,NUM,DEN : REAL;
BEGIN
  IF (P >= 1.0) OR (P <= 0.0) THEN
BEGIN 


) 
PT + DZ*xT*T xT: 


INVZ 
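
Because the body of INVZ is incomplete in this listing, the
following Python sketch shows one way the rational
approximation from the Handbook of Mathematical Functions
can be evaluated with the constants above (the leading
coefficient is taken as the Handbook value 2.515517). It is
an illustration only: the name `normal_quantile`, the use of
a lower-tail probability as the argument, and the handling
of the sign are assumptions, not the package's code.

```python
import math

# Coefficients of the rational approximation.
C1, C2, C3 = 2.515517, 0.802853, 0.010328
D1, D2, D3 = 1.432788, 0.189269, 0.001308

def normal_quantile(p):
    """Approximate x such that P(Z <= x) = p for a standard normal Z."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    q = p if p < 0.5 else 1.0 - p          # area in the nearer tail
    t = math.sqrt(-2.0 * math.log(q))
    num = C1 + C2 * t + C3 * t * t
    den = 1.0 + D1 * t + D2 * t * t + D3 * t * t * t
    x = t - num / den                       # deviate cutting off tail area q
    return -x if p < 0.5 else x

print(normal_quantile(0.975))
```

The approximation works on the smaller tail area, so the
sign is restored at the end according to which side of 0.5
the probability falls.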
FUNCTION T;

ANS
si Ce a 

T <= 
Bio Ss 

oss te 0 = ANS/2.0;3 


END; 
END; (* T *) 
FUNCTION INVT;
(**********************************************
*                                             *
*   T QUANTILES                               *
*   CACM                                      *
*   ALGORITHM 396                             *
*                                             *
**********************************************)


CONST
  HALFPI = 1.570796327;


VAR
  PT,DEN,A,B,C,D,X,Y : REAL;
BEGIN
  IF (DF < 1) OR (P >= 1.0) OR (P <= 0.0) THEN
    BEGIN
      ERROR;
      EXIT(INVT);
    END
  ELSE
    BEGIN
      PT := P;
      IF P > 0.5 THEN
        P := 2.0*(1.0 - P)
      ELSE
        P := 2.0*P;
      IF DF = 2 THEN
        INVT := SQRT(2.0/(P*(2.0 - P)) - 2.0)
      ELSE
        BEGIN
          IF DF = 1 THEN
            BEGIN
              P := P*HALFPI;
              INVT := COS(P)/SIN(P);
            END
          ELSE
            BEGIN
              A := 1.0/(DF - 0.5);
              B := 48.0/(A*A);
              C := ((20700.0*A/B - 98.0)*A - 16.0)*A + 96.36;
              D := ((94.5/(B + C) - 3.0)/B + 1.0)*SQRT(A*HALFPI)*DF;
              X := D*P;
              Y := XPN(X,(2.0/DF));
              IF Y > 0.05 + A THEN
                BEGIN
                  X := INVZ(P*0.5);
                  Y := X*X;
                  IF DF < 5 THEN
                    C := C + 0.3*(DF - 4.5)*(X + 0.6);
                  C := (((0.05*D*X - 5.0)*X - 7.0)*X - 2.0)*X + B + C;
                  Y := (((((0.4*Y + 6.3)*Y + 36.0)*Y + 94.5)/C - Y - 3.0)/B
                       + 1.0)*X;
                  Y := A*(Y*Y);
                  IF Y > 0.002 THEN
                    Y := EXP(Y) - 1.0
                  ELSE
                    Y := 0.5*(Y*Y) + Y;
                END
              ELSE
                Y := ((1.0/(((DF+6.0)/(DF*Y) - 0.089*D - 0.822)*
                     (DF+2.0)*3.0) + 0.5/(DF+4.0))*Y - 1.0)*
                     (DF+1.0)/(DF+2.0) + 1.0/Y;
              IF PT >= 0.5 THEN
                INVT := SQRT(DF*Y)
              ELSE
                INVT := -SQRT(DF*Y);
            END;
        END;
    END;
END; (* INVT *)
FUNCTION CHISQ;


(**********************************************
*                                             *
*   CHI-SQUARE DISTRIBUTION                   *
*   SOME COMMON BASIC PROGRAMS                *
*   (DF <= 40) P. 130                         *
*   HANDBOOK OF MATHEMATICAL FUNCTIONS        *
*   (DF > 40) P. 941                          *
*                                             *
**********************************************)
VAR
  I : INTEGER;
  Y,POWER,TEMP,NUM,DEN,J,L,M : REAL;
BEGIN
  IF (DF < 1) OR (X <= 0.0) THEN
    BEGIN
      ERROR;
      EXIT(CHISQ);
    END
  ELSE IF DF > 40 THEN
    BEGIN
      (* NORMAL APPROXIMATION FOR LARGE DF *)
      Y := ((XPN(X/DF,1.0/3.0) - 1.0) + 2.0/(9.0*DF))/SQRT(2.0);
      Y := Y*3.0*SQRT(DF);
      IF Y < -4.3 THEN
        CHISQ := 0.0
      ELSE IF Y > 4.3 THEN
        CHISQ := 1.0
      ELSE CHISQ := Z(Y,0.0,1.0);
    END
  ELSE
    BEGIN
      DEN := 1.0;
      TEMP := DF;
      REPEAT
        DEN := DEN * TEMP;
        TEMP := TEMP - 2.0;
      UNTIL TEMP < 2.0;
      POWER := (DF + 1) DIV 2;
      NUM := XPN(X,POWER)*EXP(-X/2.0)/DEN;
      IF ODD(DF) THEN
        J := SQRT(2.0/X/3.1415926)
      ELSE
        J := 1.0;
      L := 1.0;
      M := 1.0;
      REPEAT
        DF := DF + 2;
        M := M*X/DF;
        L := L + M;
      UNTIL M < 0.0000001;
      L := L - M;
      CHISQ := J*NUM*L;
    END;
END; (* CHISQ *)
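
The two branches of CHISQ (a series for 40 or fewer degrees
of freedom and a normal approximation above 40) can be
checked against tabled values with a short Python sketch.
This is an illustration under assumptions, not the package's
code: the name `chisq_cdf` is hypothetical, Python's
`math.erf` stands in for the package's FUNCTION Z, and the
function is taken to return the cumulative (lower tail)
probability.

```python
import math

def chisq_cdf(x, df):
    """Approximate P(chi-square with df degrees of freedom <= x)."""
    if df < 1 or x <= 0.0:
        raise ValueError("df must be >= 1 and x must be > 0")
    if df > 40:
        # Cube-root normal approximation for large df.
        y = (x / df) ** (1.0 / 3.0) - 1.0 + 2.0 / (9.0 * df)
        y *= 3.0 * math.sqrt(df / 2.0)
        return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))
    # den = df * (df - 2) * (df - 4) * ...
    den, temp = 1.0, float(df)
    while True:
        den *= temp
        temp -= 2.0
        if temp < 2.0:
            break
    power = (df + 1) // 2
    num = x ** power * math.exp(-x / 2.0) / den
    j = math.sqrt(2.0 / x / math.pi) if df % 2 == 1 else 1.0
    # Series 1 + x/(df+2) + x^2/((df+2)(df+4)) + ...
    total, term, n = 1.0, 1.0, df
    while True:
        n += 2
        term *= x / n
        if term < 1e-7:
            break
        total += term
    return j * num * total

print(chisq_cdf(3.841, 1))
print(chisq_cdf(67.505, 50))
```

Both calls use tabled 0.95 quantiles, so each result should
be close to 0.95.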
FUNCTION INVCHI;
(**********************************************
*                                             *
*   CHI-SQUARE QUANTILES                      *
*   CACM                                      *
*   ALGORITHM 451                             *
*                                             *
**********************************************)
VAR
  F,F1,F2,TEMP : REAL;
BEGIN
  IF (DF < 1) OR (P <= 0.0) OR (P >= 1.0) THEN
    BEGIN
      ERROR;
      EXIT(INVCHI);
FUNCTION F2;
(**********************************************
*                                             *
*   F DISTRIBUTION                            *
*   LARGE DEGREES OF FREEDOM                  *
*   EDUCATIONAL AND PSYCHOLOGICAL MEASUREMENT *
*   VOL. 25, NO. 3, P. 877-879                *
*                                             *
**********************************************)
CONST
  C1 = 0.196854; C2 = 0.115194; C3 = 0.000344; C4 = 0.019527;
VAR
  S,T,Z,J,Y,XX,K : REAL;
BEGIN
  IF X > 1 THEN
    BEGIN
      S := DF1;
      T := DF2;
      Z := X;
    END
  ELSE
    BEGIN
      S := DF2;
      T := DF1;
      Z := 1.0/X;
    END;
  J := 2.0/9.0/S;
  K := 2.0/9.0/T;
  Y := ABS((1.0-K)*XPN(Z,(1.0/3.0)) - 1.0 + J)/
       SQRT(K*XPN(Z,(2.0/3.0)) + J);
  XX := 1.0 + Y*(C1 + Y*(C2 + Y*(C3 + Y*C4)));
  XX := 0.5/XPN(XX,4.0);
  IF X >= 1.0 THEN
    F2 := 1.0 - XX
  ELSE
    F2 := XX;
END; (* F2 *)
FUNCTION F;
BEGIN
  IF (DF1 < 1) OR (DF2 < 1) OR (X < 0.0) THEN
    BEGIN
      ERROR;
      EXIT(F);
    END;
  IF ((DF1 < 100) AND (DF2 < 100)) OR
     (DF1 < 20) OR (DF2 < 20) THEN
    F := F1(X,DF1,DF2)
  ELSE
    F := F2(X,DF1,DF2);
END; (* F *)
FUNCTION INVF;
(**********************************************
*                                             *
*   F QUANTILES                               *
*   BISECTION SEARCH FOR                      *
*   SMALL DEGREES OF FREEDOM                  *
*   LARGE DEGREES OF FREEDOM                  *
*   HANDBOOK OF MATHEMATICAL FUNCTIONS        *
*   P. 947                                    *
*                                             *
**********************************************)


CONST EPS = 0.005;


VAR
  TEMP,PT,ENDR,ENDL,MIDPT,STEP,W,H,T,Y,Z : REAL;
BEGIN
  IF (DF1 < 1) OR (DF2 < 1) OR (P <= 0.0) OR (P >= 1.0) THEN
    BEGIN
      ERROR;
      EXIT(INVF);
    END
  ELSE IF (DF1 = 1) OR (DF2 = 1) THEN
    BEGIN
      IF DF1 = 1 THEN
        INVF := SQR(INVT((0.5*(1.0+P)),DF2))
      ELSE
        INVF := 1.0/SQR(INVT((1.0-P/2.0),DF1));
    END
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5R (TEM ENDL) <= EPS); 
- ENDL) > EPS THEN 
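
The header of INVF names a bisection search for small
degrees of freedom, but most of that code is missing from
this listing. The Python sketch below illustrates the
general technique only and is not the package's routine: the
function names are hypothetical, treating EPS = 0.005 as the
final width of the search interval is an assumption, and the
closed-form cumulative distribution for 2 numerator degrees
of freedom is used as a stand-in for the package's
FUNCTION F.

```python
def f_cdf_df1_2(x, df2):
    """Exact P(F <= x) when the numerator has 2 degrees of freedom."""
    return 1.0 - (1.0 + 2.0 * x / df2) ** (-df2 / 2.0)

def quantile_by_bisection(cdf, p, eps=0.005):
    """Find x >= 0 with cdf(x) close to p by halving a bracketing interval."""
    if p <= 0.0 or p >= 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    left, right = 0.0, 1.0
    while cdf(right) < p:          # step right until the quantile is bracketed
        left = right
        right *= 2.0
    while right - left > eps:      # halve the interval until it is narrow enough
        mid = (left + right) / 2.0
        if cdf(mid) < p:
            left = mid
        else:
            right = mid
    return (left + right) / 2.0

# 0.95 quantile of F with 2 and 10 degrees of freedom.
q = quantile_by_bisection(lambda x: f_cdf_df1_2(x, 10), 0.95)
print(q)
```

Each halving costs one evaluation of the cumulative
distribution, so a tolerance of 0.005 needs only about ten
evaluations once the quantile is bracketed.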
APPENDIX B


PROCEDURE CONVERTSTRING;


(**********************************************
*                                             *
*   PROCEDURE TO CONVERT A STRING             *
*   VARIABLE TO A NUMBER                      *
*                                             *
**********************************************)


LABEL 13;


VAR 
Beet NT SIGN Ps LWVEGER; 


BEGIN 
DAT 
PR 


to 

ry 

ro < b> 
eon tT) 


ll 
© ibes 


TO LENGTH (DATA) DO 


WO gg O25 & FOr 

WORE Or = 

4 ae 
li Ce 


AHVMW HOQOWOAFrNQ Ho 


Pro Wma 2M 


a 
9 
® 


16 
12' Zt tyt tt Et ez! gt ge: 
eee THEN 


RL**® PEACE + (ORD (CH)=48) ; 


t AYES |g a fe i ae | THEN 
° 

B 

a: 

L := 
LACE 3:= 10; 
E 

I 

L 

L 


1° 
(ORD (CH) -48) 


7.0 THEN iS MAX I 
2 


NTEGER * 
eS 32767 " 